Pre-processing Large Resources for Family Names Research

Author

  • Adam Rambousek
Abstract

This paper describes the methodology and tools used to preprocess historical archive documents in various formats and convert them to a unified format. The resources were used to investigate the origins and geographical distribution of surnames in the United Kingdom, as part of the Family Names in Britain and Ireland research project. The data extracted from the documents, and the connections between them, proved to be a valuable research resource that helped to speed up the lexicographic work.

Similar resources

Protein Name Tagging for Biomedical Annotation in Text

We explore the use of morphological analysis as preprocessing for protein name tagging. Our method finds protein names by chunking based on a morpheme, the smallest unit determined by the morphological analysis. This helps to recognize the exact boundaries of protein names. Moreover, our morphological analyzer can deal with compounds. This offers a simple way to adapt name descriptions from bio...

Throne Name in the Achaemenid period

The Achaemenid kings after Darius I chose Darius, Xerxes, or Artaxerxes as their throne name when they were nominated for, or took over, the succession. Each of these kings chose one of these names according to what had happened to him before he reached the throne, how he attained the throne, and his designs and program. These names are not personal and real names, but they ...

The Statistical Analysis of Family Names of Donators For WenChuan

This paper analyzes the family names of individual donors who donated through China Construction Bank Corporation to the Chinese Red Cross Foundation for the WenChuan earthquake. The distribution of family names, the first 100 family names and their shares, the probability of the same family names, and the Gini coefficient are all given. A heavy disproportion is shown in the di...

Genenames.org: the HGNC and VGNC resources in 2017

The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. Currently the HGNC database contains almost 40 000 approved gene symbols, over 19 000 of which represent protein-coding genes. In addition to naming genomic loci we manually curate genes into family sets based on shared characteristics such as ho...

Online Processing Redux

The term "online" has become an all-too-common addendum to database system names of the day. In this article we reexamine the notion of processing queries online. We distinguish between online processing and preprocessing, and argue that online processing for large queries requires redesigning major portions of a database system. We highlight pressing applications for truly online processing, a...

Publication date: 2016